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Abstract 

This paper solves an ‘incremental’ form of the sen- 
sitivity equations derived by differentiating the dis- 
cretized thin-layer Navier Stokes equations with re- 
spect to certain design variables of interest. The 
equations are solved with a parallel , preconditioned 
Generalized Minimal RESidual (GMRES) solver on 
a distributed-memory architecture. The ‘serial’ sen- 
sitivity analysis code is parallelized by using the 
Single Program Multiple Data (SPMD) program- 
ming model, domain-decomposition techniques, and 
message-passing tools. Sensitivity derivatives are 
computed for low and high Reynolds number flows 
over a NACA 1406 airfoil on a 32-processor Intel 
Hypercube, and found to be identical to those com- 
puted on a single-processor Cray Y-MP. It is esti- 
mated that the parallel sensitivity analysis code has 
to be run on 40—50 processors of the Intel Hyper- 
cube in order to match the single-processor process- 
ing time of a Cray Y-MP. 

Introduction 

The development of parallel computers with a 
large number of processing nodes offers tremendous 
increases in computer resources for computational 
scientists. Parallel computers afford the realistic 
possibility of combining the knowledge of diverse 
engineering disciplines to produce an integrated, 
multi-disciplinary effort to solve complex problems 
in engineering design. 

The High Speed Civil Transport (HSCT) ob- 
jectives of the High Performance Computing and 
Communications (HPCC) program 1 depend heavily 
on the design of methodologies for multi-disciplinary 
analysis and design. Massively parallel implemen- 
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tations of such design methodologies would be re- 
quired to ensure improved computational efficiency 
and tractability for large problems. The research 
in this paper is chiefly motivated by this need for 
the development of parallel, multi-disciplinary de- 
sign techniques. 

The field of Computational Fluid Dynamics 
(CFD) has matured sufficiently to the point that 
flow solutions provided by advanced CFD codes can 
be reliably used to extract information for use in 
aerospace vehicle design. A typical CFD code solves 
a system of partial differential equations (PDEs) 
on a discrete mesh, for a given, fixed, set of flow 
conditions (like Mach number, Reynolds’ number 
etc.). However, practical design codes usually re- 
quire much additional information in the form of 
Sensitivity Derivatives (SD), in order to produce an 
optimum design. 

A particular CFD code may be extended to cal- 
culate aerodynamic sensitivity derivatives which are 
consistent with the discrete flow solution provided 
by the code. Each sensitivity derivative quantifies 
the derivative of a system response (e.g. lift on an 
airfoil) with respect to an independent design vari- 
able (e.g. thickness of the airfoil). A large number of 
sensitivity derivatives may be required to evaluate 
the relative influence of all the parameters which 
affect the vehicle design. Hence, it becomes criti- 
cal that these sensitivity derivatives be computed 
‘cheaply’, if CFD codes are to be incorporated into 
practical multi-disciplinary design environments. 

The simplest way of calculating sensitivity 
derivatives is by computing the ‘difference’ between 
the two converged CFD solutions which correspond 
to two different values for the design variable of in- 
terest. This method is referred to as the ‘finite- 
difference’ method of calculating sensitivity deriva- 
tives. Although simple and straightforward, this 
method can become prohibitively expensive in terms 
of computational cost, particularly if the number of 
design variables of interest is large. 

The sensitivity derivatives can also be com* 
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puted by differentiating the governing equations of 
fluid flow ; the differentiation can be undertaken 
prior to the numerical discretization (continuum 
method) or after the numerical discretization (dis- 
crete method). Both of these methods are com- 
putationally efficient as compared to the ‘finite- 
difference ’ method, and have been compared by 
Shubin and Frank 2 . The linear system of equations 
obtained by this differentiation of the governing 
equations results in the Sensitivity Equations 3 * 4 ' 5 . 

This research compares the ‘finite-difference’ 
method and the discrete method for calculating sen- 
sitivity derivatives on a distributed-memory ma- 
chine. The sensitivity equations for the discrete ap- 
proach are derived by direct differentiation of the 
system of discrete non-linear algebraic equations 
which model the thin-layer Navier-Stokes (TLNS) 
equations for 2-D steady flow 3 . The sensitivity 
equations represent a large system of coupled, lin- 
ear, algebraic equations, which must be solved to 
yield the sensitivity derivatives of interest. 

The coefficient matrix corresponding to the lin- 
ear system is large and sparse, and usually reflects 
poor diagonal dominance. The large size of the ma- 
trix rules out solution by direct matrix inversion, as 
this would require prohibitively large core memory 
(particularly for 3-D problems). In ‘standard’ form, 
the discrete sensitivity equations must be solved ‘as 
is’ ; the method admits no approximations to the co- 
efficient matrix. Thus, standard iterative methods 
may converge very slowly, or may fail due to the 
lack of diagonal dominance (or poor conditioning) 
of the coefficient matrix. The ‘incremental’ form of 
the sensitivity equations developed by Korivi et. al 6 
admits ‘approximations of convenience’ to the co- 
efficient matrix, and thus overcomes the problems 
posed in solving the ‘standard’ sensitivity equations. 

The ‘incremental’ form admits approximations 
to improve the diagonal dominance of the coeffi- 
cient matrix (e.g. a time-term may be added to 
the main-diagonal, certain off-diagonal terms may 
be neglected etc.). This implies that quasi-Newton 
iterative methods (which exist in most CFD codes) 
can be used to solve the discrete sensitivity equa- 
tions. This also implies that parallel solvers devel- 
oped for iterative solutions of large, sparse linear 
systems of equations 7 ' 8 * 9 , can be applied without 
modification to solve the system of sensitivity equa- 
tions of interest. The existing literature pertaining 
to parallel sensitivity analysis seems to be extremely 
sparse. Some efforts in this direction are being made 
by Das et. al 10 and Olander et. al 11 . 

The particular quasi- Newton iterative method 


used in this paper is based on a preconditioned 
GMRES 12 solver. This solver has been success- 
fully parallelized and validated for the original CFD 
code used in this research, in a message- passing 
environment, on a distributed-memory machine 13 . 
A domain-decomposition strategy has been used to 
partition the original problem amongst all available 
processors. The parallel sensitivity analysis code 
thus developed is validated on a 32-processor Intel 
Hypercube. 

This paper is organized into four sections. This 
introduction section is followed by a discussion of 
(a) the analysis code — the governing equations, 
spatial discretizations and implicit formulation, (b) 
sensitivity equations in ‘standard’ and ‘incremen- 
tal’ form, (c) parallel computing issues related to 
the parallelization of (a) and (b). This is followed 
by computational results and comparisons of ‘finite- 
difference’ and discrete sensitivity derivatives for 
laminar and turbulent flow cases. The final section 
is a summary of the present work, and discusses fu- 
ture work in this area. 

Presentation of Theory 

Parallel Flux-Balance Computations 

The governing equations of 2-D compressible 
fluid flow considered in this study are the thin-layer 
Navier-Stokes equations written as 

1 dQ _ dF _ 5G dGl 1 __ ^ ^ 

J dt ~ d£ dr\ dr] 

The governing equations are solved computation- 
ally in their integral, conservation law form in 
generalized coordinates, using an implicit, upwind, 
cell-centered finite volume formulation. The invis- 
cid fluxes are discretized using Van Leer’s 14 flux- 
splitting scheme. The viscous fluxes are evaluated 
with second-order accurate central-differences. A 
nine-point stencil is used for higher-order accurate 
calculations of the inviscid and viscous fluxes. 

In a multiple-processor system, the discretized 
form of equation 1 is solved by dividing the sin- 
gle, large uniprocessor grid into a number of smaller 
grids (domain-decomposition) ; each grid is then as- 
signed to one individual processor. Note, that the 
memory-access for each processor in a distributed- 
memory environment is limited to the data resid- 
ing in its local-memory only. This implies that the 
parallel, multiple-processor calculations will be con- 
sistent with the origin al uniprocessor calculations 
only if information is exchanged across all grid in- 
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terfaces which are created because of the domain- 
decomposition. 

Since the computation of the residual vector R 
uses a nine-point stencil, the flux-evaluation for cell- 
faces which lie on (and adjacent to) domain bound- 
aries will require information from (a maximum of) 
two adjacent cells which reside in a neighboring pro- 
cessor. This information is stored in two layers of 
‘ghost’ cells at each domain boundary. Data from 
the neighboring domains is ‘communicated to these 
‘ghost’ cells, before the flux-evaluation routines are 
invoked. The implementation of boundary condi- 
tions at physical boundaries may also require com- 
munication amongst processors. In particular, air- 
foil calculations on C and O-type meshes require 
communication between non-neighboring processors 
in order to effect C and O-type periodicity. This 
is achieved by communication amongst processors 
which lie along the ‘cut’ of the particular C or O- 
type grid. Further details regarding parallelisation 
of the original CFD code are contained in refer- 
ence 13. 

A discrete numerical steady-state solution of 
equation 1 implies that 


{*«*)} = ( 0 } ( 2 ) 


where is the residual vector, for the steady- 

state solution {<?*}. The accuracy of {<?*} is di- 
rectly affected by the accuracy of computation of the 
inviscid and viscous fluxes. Equation 2 represents a 
large system of coupled, algebraic, non-linear equa- 
tions. An ‘implicit’ linearization of this non-linear 
system produces a linear system which can be solved 
directly by Newton’s root-finding method as 


. W . 


{ n A Q} = {R n (Q)} 


( 3 ) 


{Q n+1 > = {<?"} + { n AQ} (4) 

ns 1,2,3,... V 

In most CFD applications, [ dR FQ ^-] 1S a lar S e > 
sparse, banded matrix, which can be very cum- 
bersome to compute exactly (including consistent 
boundary-condition linearizations, true flux jaco- 
bians etc.). Even if the exact is available, 

the core memory required to invert this matrix re- 
stricts the practical application of exact Newton’s 
method to all but moderate sized 2-D problems. 

Hence, in practice, a quasi-Newton method is 
used to solve equation 3. An approximate matrix 

[®OT is constructed, by introducing lineariza- 
tion errors, adding an artificial ‘time-term’ to the 


main diagonal, and/or splitting the original opera- 
tor into simpler operators. The resulting ‘approxi- 
mate’ system of equations is 

-[^^] {n AQ } = { i? n( Q )} (5) 

This approximate linear system is then solved iter- 
atively for n AQ, followed by a solution update in 
equation 4. The tradeoff in using an approximate 
operator is the reduced error-reduction per time- 
step as compared to the exact Newton’s method. 
Note, that no approximation is made in computing 
{J? n (Q)} at each time-step, and that the system is 
solved in ‘delta’ or ‘incremental’ form. The ‘delta’ 
formulation ensures that the steady-state solutions 
obtained from the quasi-Newton method and the 
exact Newton method will be identical. 

In the parallel code, the calculation of inviscid 
and viscous fluxes is followed by ‘assembly’ of the 
implicit coefficient matrix. The coefficient matrix 
is ‘assembled’ from linear combinations of the in- 
dividual flux-jacobian matrices for each cell. Each 
domain computes its flux-jacobian matrices, and no 
extra communication is required to assemble the fi- 
nal coefficient matrix. This is because a five-point 
stencil is used to compute the implicit operator, 
which provides a sparse, banded, coefficient matrix 
with five block-diagonals. 

Thus, each processor (or domain) calculates its 
own implicit matrix and residual vector, and the 
original, large, system of linear equations corre- 
sponding to the uniprocessor domain is transformed 
to a series of smaller linear systems, with one linear 
system for each processor. In this paper, a precon- 
ditioned GMRES solver is used to iteratively solve 
each linear system of equations for each domain. 
Computationally, the GMRES algorithm involves 
basic linear algebra kernels, namely, inner-products 
of vectors, saxpy operations and matrix-vector prod- 
ucts. These kernels must be parallelized in order to 
obtain a parallel version of the GMRES solver. The 
parallel GMRES solver used in this paper has been 
validated to have the exact convergence rate of the 
serial GMRES solver 13 . 

If the implicit coefficient matrix provided to the 
GMRES solver lacks diagonal dominance (as is the 
case with the sensitivity equations), the solver con- 
verges very slowly to the solution of the correspond- 
ing linear system. The convergence rate of the solver 
can be improved by preconditioning the linear sys- 
tem. Preconditioning can greatly reduce the over- 
all computational effort required to solve the linear 
system. The Lower-Uppei Symmetric Gauss-Seidel 
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(LUSGS) scheme of Yoon and Jameson 15 is modi- 
fied into a pointwise-implicit block-solver, for use as 
a ‘local’ preconditioner in this work. This piecondi- 
tioner is applied individually to the linear system in 
each domain, and is found to be superior to the con- 
ventional preconditioners based on incomplete fac- 
torizations of the coefficient matrix 1 ®. 

Aerodynamic Sensitivity Equations 

In general, the j th aerodynamic system re- 
sponse, Cj is dependent on the vector of indepen- 
dent variables {<?*}, the vector of grid coordinates 
{X }, and the vector of design variables, {0}. This 
can be written as 

( 6 ) 

The sensitivity equations for any particular system 
response, Cj, can be obtained from equation 6 as 



Equation 7 represents the total rate of change of Cj 
with respect to (3k ■ 

The large system of non-linear equations which 
model the fluid flow (equation 2) can be generalized 
in the above vein and rewritten as 

{R{Q*(P),X{$)J)} = { 0> ( 8 ) 

The differentiation of equation 8 with respect to (3k 
yields 



This equation represents the so-called ‘standard’ 
form of the sensitivity equations. The equation 
is solved for the vector of sensitivity derivatives 
fox each design variable of interest, (3k . 
This method of computing sensitivi ty d erivatives is 
known as the Quasi- Analytical Method. 

The matrix ffST is the Jacobian of the non- 
linear equations (evaluated at steady-state including 
consistently linearized boundary conditions) with 
respect to the field variables. The discrete sensitiv- 
ity derivatives are represented by the vector 


which signifies the iotal change in the vector of field 
variables for a particular design variable, (3k » The 
matrix [fj] is the Jacobian of the flow equations 
(evaluated at steady-state) with respect to the grid 
coordinates ; { ^} is the grid- sensitivity vector and 
is computed by the method of Taylor et. al 5 . The 
vector { Jjp } accounts for the explicit dependencies 
(if any) of the flow equations (including boundary 
conditions) on/3*. 

Solutions for the ‘standard’ form sensitivity 
equations require a direct inversion of [§|] or it- 
erations with (a possibly poorly conditioned) [|^] 
as the coefficient matrix (similar to solving equa- 
tion 3). The standard form sensitivity equations 
are rewritten in ‘incremental’ form as 



The vector represents the mth iteration on 

the total derivative {^} > and must be driven to 

zero to find the solution °f equation 9. Note, 

that no approximations are allowed in the compu- 
tation of { } , in order that the converged solu- 

tion yields the correct, consistent, discrete sensitiv- 
ity derivatives^ 

The solution of equation 10 does allow ‘approx- 
imations of convenience’ for the left-hand-side coef- 
ficient matrix [|^]. In practice, the approximations 
are introduced by using a first-order discretization 
for the coeffi cient matrix, adding a pseudo ‘time- 
term’ to the main-diagonal and neglecting consis- 
tent linearization for ‘C’ and ‘O’ type boundary- 
conditions. The major advantage of solving the ‘in- 
cremental’ form of the sensitivity equations over the 
standard form is that any linear-system solver that 
is used in the analysis code to solve equation 5 can 
be used without modification to solve the system 
of sensitivity equations in equation 10. This is be- 
cause the characteristics (i.e. sparsity pattern and 
diagonal dominance) of the coefficient matrices in 
equation 5 and equation 10 are very similar to each 
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other. The solution of the ‘incremental’ form sen- 
sitivity equations, as derived by Korivi et. al 6 , has 
been parallelized in this work, and will now be de- 
scribed. 

Parallel Solution of Sensitivity Equations 

The computational work involved in solving the 
‘incremental’ sensitivity equations can be divided 
into two parts : 

(i) Calculate {^ 7 } from equation 12. Note that 
the matrix [§|] is computed in parallel (from 
the steady-state values of the vector Q*), and 
re-used at each iteration. The matrix-vector 
product, [|f] {^ 7 } is the onl y vector which 
needs to be recomputed at each iteration. This 
matrix- vector multiply needs to be parallelized 
across all the available processors. Note, that 
the exact jacobian matrix [|f ] has more non- 
zeroes than the approximate matrix [|f ] • This 
implies that parallel matrix-vector multiplica- 
tion with the exact jacobian matrix will require 
more operations than with the approximate 
jacobian matrix. The matrix-vector product 

[ff ] {^} and the vector {^) remain con " 

stant through the iterative process. Both these 
vectors are computed in parallel, and stored be- 
fore the iterative process begins. 

Each matrix-vector multiplication is preceded by 
inter-processor communication. This communica- 
tion is designed to provide each processor with up- 
dated values of { ^ } from the neighboring pro- 
cessors. This ensures that the ‘parallel’ matrix- 
vector product is identical to the ‘serial’ matrix- 
vector product. 

(ii) Solve the linear system of equation 10 by an it- 
erative method. As discussed earlier, the major 
motivation for developing the ‘incremental’ sen- 
sitivity equations is to use existing CFD solvers 
(e.g. spatially-split approximate factorization, 
Krylov solvers like GMRES etc.) to solve the 
linear system of equations in equation 10. This 
paper uses a preconditioned GMRES solver to 
solve this linear system. This parallel solver has 
been validated for solving the linear system in 
equation 5 , and is an integral part of the ex- 
isting CFD code 13 . In this work, the parallel 
GMRES solver incorporates a preconditioner 
derived from the approximate jacobian matrix, 
to accelerate convergence to the exact solution 

vector {i£}- 

Recall, that the ‘finite-difference’ sensitivity deriva- 
tives are evaluated by computing successive ‘per- 


turbed’ CFD solutions for each design variable of 
interest. For example, if Cj — Cl, the steady-state 
lift-coefficient, and fa = <*, the angle of attack, then 
can be approximated by 

dC L SCl _ Aa - eg- * a (13) 

“£T~ 6a - 2A a { ’ 

C£ +Aa and are obtained by computing new 

CFD solutions using the converged solution for C£ 
as an initial guess. In this paper, the parallel code 
of reference 13 is used to obtain the ‘perturbed’ so- 
lutions for the ‘finite-difference’ sensitivities, which 
are subsequently compared with the ‘discrete sen- 
sitivities obtained from the ‘incremental’ sensitivity 
equations. 

Test Results and Discussion 

A parallel, preconditioned GMRES solver has 
been developed to compute solutions of the sensi- 
tivity equations derived from differentiation of the 
discretized Navier-Stokes equations. The total com- 
putational work corresponding to the original single- 
processor domain is partitioned amongst the various 
processors of the parallel, distributed-memory ma- 
chine. The SPMD (Single Program Multiple Data) 
model of programming is invoked as each processor 
runs identical copies of the computational code, on 
different sets of data. 

The parallel code is developed on an Intel 
Hypercube with 32 processors. The results from 
the parallel sensitivity analysis code are validated 
against the original serial code (which is run on a 
single processor of a Cray YMP). The scalability of 
the domain decomposition algorithm and the pre- 
conditioned GMRES solver is tested by running the 
parallel code on a range of processors (8,16 and 32). 
The two problems selected for validation are low 
Reynolds number subsonic flow over a NACA 1406 
airfoil, and transonic turbulent flow over a NACA 
1406 airfoil. 

Laminar Flow — Subsonic Airfoil 

The parallel sensitivity equation solver is first 
validated for low Reynolds number subsonic flow 
over a NACA 1406 airfoil. The flow conditions cor- 
respond to a freestream Mach number of Afoo = 0.6, 
angle of attack, a = 1.0°, and Reynolds number, 
Re = 5.0* 10 3 . The computational grid is a ‘C’ mesh 
of 257*65 points, with points clustered near the air- 
foil surface and the far-field boundary placed five- 
chords from the airfoil surface. The lift-corrected 
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boundary conditions are implemented on the far* 
field boundaries 17 . 

The parallel, preconditioned GMRES solver is 
initially used to obtain a converged steady-state so- 
lution, {Q*}, to the discrete non-linear flow equa- 
tions (eqn. 2). The parallel validation for the GM- 
RES solver is performed on 8 nodes of the Hyper- 
cube with an 8 * 1 partitioning of the 257 * 65 domain 
to yield domains of size 33 * 65 for each process- 
ing node. The computed lift, drag, and pitching 
moment coefficients obtained for the steady-state 
solution are (7i,=0.1815, Cp =0.0417, and Cm — 
-0.0237. These coefficients compare exactly with 
those computed with the serial version of the code 
on a Cray Y-MP. 

The steady-state solution is used as an initial 
guess for the ‘finite-difference’ method (FDM), to 
compute sensitivity derivatives for C&, C& and Cm 
with respect to the an gle-of- attack <*, the freestream 
Mach Number ikfoo, and the Reynolds Number 
Re. The forward and backward perturbations for 
the ‘finite-difference’ calculations are set to A/?* = 
±5*10 _6 *^jb (for eqn. 13). A Courant number of 25, 
with (a maximum of) ten GMRES sub-iterations 
per time-step, is used to generate a new steady- 
state solution for each ‘perturbed’ condition. The 
l 2 norm of the global residual vector is reduced to 
a value of 10” 11 to determine the converged solu- 
tion for each ‘perturbed’ variable. A summary of 
the sensitivity derivatives computed by the parallel 
‘finite-difference’ method on 32 processors of the In- 
tel Hypercube is presented in table la. The sensitiv- 
ity derivatives obtained from the parallel, multiple- 
domain version of the FDM are identical to those 
obtained from the serial, single-domain version of 
the FDM on the Cray Y-MP. 

The number of iterations to convergence (n c ) 
for the FDM calculations are plotted in fig. 1. The 
values of n c for a and Re remain fairly constant as 
the number of processors increases. However, the 
calculations for (Mach No.) show an increase 
in n c as the number of processors increases from one 
(Cray Y-MP) through 8, 16 and 32 (Hypercube). 
This increase in n c may be attributed to an increase 
in stiffness of the coefficient matrix of eqn. 10, as the 
single-domain problem is partitioned into multiple- 
domain problems on the parallel machine. 

The processing times for the FDM calcula- 
tions on the Cray Y-MP (1 processor) and the In- 
tel Hypercube (8, 16, 32 processors) are summa- 
rized in fig. 2. The processing times shown do 
not include the time required to compute the ini- 
tial (unperturbed) steady- state solution. The to- 
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Figure 1. Iterations with FDM (Subsonic) 


tal time required to obtain all sensitivity derivatives 
for the three design variables on 32 parallel proces- 
sors is 1256 seconds, as compared to 778 seconds 
on the Cray Y-MP. Thus, the 32 processor Intel 
HypeTcube is 61% slower than the single-processor 
Cray Y-MP when calculating sensitivity derivatives 
by the FDM. Hence, it may be projected that 52 
(or more) parallel processors would be required to 
match (or exceed) the single-processor performance 
of the Cray Y-MP for the FDM calculations. 

PROCESSING TIME vs. PROCESSORS 



The parallel, preconditioned GMRES solver de- 
veloped for the original CFD code (and used to 
solve eqn. 5) is applied in the sensitivity analysis 
code to solve the ‘incremental’ sensitivity equations 
(eqn. 10). The steady-state solution, {Q*}, is used 


6 





in equation 8 to compute the right -hand-side vector 
for equation 9. The sensitivity derivatives 
obtained from solving the ‘incremental’ equations 
by the Quasi- Analytical Method (QAM) on the par- 
allel machine are summarized in table lb* The val- 
ues of the sensitivity derivatives in tables la and 
lb are identical to five decimal places. This vali- 
dates the accuracy of the parallel Quasi- Analytical 
Method for providing highly accurate sensitivity 
derivatives from solutions of the ‘incremental sen- 
sitivity equations (eqn. 10). 

The system of sensitivity equations is declared 
solved when the I 2 norm of the residual vector 
{^ 7 } “ reduced to 1(T 6 . A Courant number of 
25, with (a maximum of) ten GMRES sub-iterations 
are used to solve the linear system at each itera- 
tion. The variation in the number of iterations to 
convergence with the number of processors is plot- 
ted in fig. 3. As expected, the values of n c for the 
three design variables remain fairly constant as the 
number of processors increases. This demonstrates 
the scalability of the preconditioned GMRES solver 
when applied as a linear-system solver for solving 

the sensitivity equations by the QAM. 

ITERATIONS vs. PROCESSORS 



Figure 3. Iterations with QAM (Subsonic) 


A comparison of the processing times over var- 
ious processors for the quasi-analytical method is 
presented in fig. 4. The total time required on 32 
processors to obtain all sensitivity derivatives for all 
three design variables is 944 seconds, as compared 
to 667 seconds on the Cray Y-MP. Thus, a complete 
calculation of sensitivity derivatives by the QAM on 
32 processors of the Intel Hypercube requires 42% 
more processing time than a single processor Cray 
Y-MP. Assuming a linear speedup for the parallel 


QAM, 45 (or more) parallel processors would be 
required to match (or exceed) the single- processor 
performance of the Cray Y-MP. 

PROCESSING TIME vs. PROCESSORS 



A head-to-head comparisons of the parallel 
FDM and the parallel QAM reveals that the QAM 
on 32 processors (944 secs.) is 25% faster than the 
FDM on 32 processors (1256 secs.). As the problem 
size increases (for denser 2-D grids and 3-D grids) 
and the computational work per processor increases, 
the advantage of the QAM will further increase with 
respect to the FDM. This is because an increase in 
workload will cause a more significant increase in 
the ‘computation to communication ratio’ for the 
QAM than the FDM. 

Turbulent Flow — Transonic Airfoil 

This second test case demonstrates the com- 
putation of sensitivity derivatives for transonic tur- 
bulent flow over a NACA 1406 airfoil. The flow 
conditions correspond to Afoo = 0*8) a = 1*0°, and 
Re— 5.0*10 6 . A ‘C’ mesh with 257*65 points is used, 
with the far-field placed five chord-lengths from the 
airfoil surface. The clustering near the airfoil surface 
is much tighter than the previous (laminar) grid, 
in order to account for the higher Reynolds num- 
ber of the flow. The laminar viscosity is computed 
by Sutherland’s temperature law, and the turbulent 
viscosity is modeled by the algebraic model of Bald- 
win and Lomax 18 . 

A steady-state solution, {Q*}, is first obtained 
with the preconditioned GMRES solver on 32 pro- 
cessors of the Intel Hypercube. The computed 
lift, drag and pitching moment coefficients are 
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Cx,=0.4166, C D =0.7750 E-2, and C M = -0.4563 
E— 1. AH three coefficients compare identically with 
those computed by the serial code on a Cray Y-MP . 
This validates the accuracy of the parallel Baldwin 
Lomax turbulence model for this test case. 

The sensitivity derivatives of Cx,, Cd and Cm 
with respect to a, M©© and Re are first calculated by 
the ‘finite-difference’ method (FDM). The GMRES 
solver is used to obtain the ‘perturbed’ steady-state 
solutions from the unperturbed solution, {<?*}♦ The 
time-integration parameters for this test case are 
identical to those used in the subsonic test case. The 
sensitivity derivatives obtained by the FDM on 32 
processors of the parallel machine are summarised in 
table 2a. All the sensitivity derivatives are identical 
to the values obtained by serial calculations with 
the FDM on a single-processor Cray Y-MP. 

The convergence rate of the preconditioned 
GMRES solver is unaffected by the number of pro- 
cessors used in the FDM calculations. This is clearly 
evident from the plots in fig. 5. The scalability of 
the parallel GMRES solver as used in the FDM is 
thus validated for 32 processors. The processing 
time characteristics for the three design variables are 
shown in fig. 6. The total time is dominated by the 
Moo calculation, which is consistent with the results 
for the subsonic test case. The 32 processor parallel 
calculations (2792 secs.) are 55% slower than the 
equivalent single-processor Cray Y-MP calculations 
(1802 secs.), which implies that 50 (or more) parallel 
processors would match (or exceed) the Cray Y-MP 
performance. These projections are very similar to 
those made for the subsonic FDM calculations. 


ITERATIONS vs. PROCESSORS 



Number of Processors (N p ) 

Figure 5. Iterations with FDM (Transonic) 


PROCESSING TIME' vs. PROCESSORS 



Number of Processors (N p ) 

Figure 6. Proc. Time with FDM (Transonic) 

The ‘incremental’ sensitivity equations for this 
test case are solved by using the parallel precon- 
ditioned GMRES solver with a Courant number of 
25. The sensitivity derivatives computed with 32 
processors are listed in table 2b. These sensitiv- 
ity derivatives compare exactly with the sensitiv- 
ity derivatives obtained from serial calculations with 
the QAM, However, they do exhibit some discrep- 
ancies when compared with the parallel FDM calcu- 
lations. This is because the variation of laminar and 
turbulent viscosities with respect to the field vari- 
ables, {<?*}, and the computational grid, {X}, is 
neglected in the numerical construction of the vec- 
tor ~ } of equation 9. Hence, for turbulent flow 
cases, the ‘incremental’ sensitivity equations cannot 
provide the exact sensitivity derivatives ; the ‘finite- 
difference ’derivatives are more accurate in this case. 
This is true regardless of whether the sensitivity 
equations are solved on the serial or parallel ma- 
chines. 

The variation in the number of iterations with 
the number of processors for the QAM is plotted 
in fig. 7. It is clear that the number of iterations 
to convergence remains constant for any number of 
processors. This is an important result as it helps 
establish the scalability of the preconditioned GM- 
RES solver for parallel sensitivity derivativecalcula- 
tions with the QAM. 

The processing times for the three design vari- 
ables are plotted in fig. 8. The sensitivity deriva- 
tivecalculations for Re require the maximum pro- 
cessing time, which is consistent with the results for 
the subsonic test case. The parallel calculations on 
32 processors (2140 secs.) are 39% slower than the 
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ITERATIONS vs. PROCESSORS 
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Figure 7. Iterations with QAM (Transonic) 


serial Cray Y-MP calculations (1540 secs.). This re- 
sult compares excellently with the processing tune 

characteristics for the subsonic test case. 

PROCESSING TIMl vs. PROCESSORS 



A comparison of the parallel FDM and paral- 
lel QAM calculations reveals that the latter (2140 
secs.) is 24% faster than the former (2792 secs.). 
This observation is identical to that made for the 
subsonic test case, and reinforces the fact that 
the parallel preconditioned GMRES solver performs 
uniformly for sensitivity derivativecalculations for 
subsonic and transonic flow conditions. 

Conclusions and Future Work 


number of parallel processors, is successfully de- 
signed and tested for obtaining sensitivity deriva- 
tives of the Navier- Stokes equations on a distributed 
memory parallel machine. The solver is based on a 
‘locally' preconditioned GMRES algorithm and is 
constructed with a general domain-decomposition 
strategy in an SPMD programming framework. 

All tests conducted on a 32 processor Intel Hy- 
percube indicate that the parallel GMRES solver 
provides consistent and accurate sensitivity deriva- 
tives for both low Reynolds number (laminar) and 
high Reynolds number (turbulent) flows. The ac- 
curacy of the computed sensitivity derivatives is 
found to be independent of the number of proces- 
sors, for both flow conditions tested in this paper. 
The finite-difference method of calculating sensitiv- 
ity derivatives is found to be more accurate than 
the quasi-analytical method, particularly for high 
Reynolds number (turbulent) flows. The quasi- 
analytical method of calculating sensitivity deriva- 
tives is 25% more efficient than the finite-difference 
method, in terms of processing time. The parallel 
processing times for both the low and high Reynolds 
number test cases indicate that 40-50 parallel pro- 
cessors of an Intel Hypercube would match the per- 
formance of a Cray Y-MP. The parallel, precondi- 
tioned GMRES solver exhibits similar processing 
time characteristics and scalability when calculat- 
ing sensitivity derivatives for both the laminar and 
turbulent flow cases. 

In future work, the procedure for obtaining sen- 
sitivity derivatives developed in this paper will be 
tested on larger parallel machines. This will be 
done to further study the scalability of the code, 
and the effectiveness of the parallel solver on large 
numbers of processors. The sensitivity analysis code 
will also be ported to a ‘cluster’ of workstations, in 
order to study its performance characteristics in a 
loosely coupled parallel environment. The feasabil- 
ity of obtaining sensitivity derivatives in a paral- 
lel environment by automatic differentiation of the 
Navier-Stokes equations with a software package like 
ADIFOR 19 , will also be investigated. 
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Table la. Subsonic Airfoil ; Finite- Difference Method 


(3k 

dC L /d(3 k 

dC M /dp h 

dCp/dfik 

a 

6.1218E+0 

9.1815E— 2 

— 3.1690E— 2 

Moo 

5.4302E— 3 

1.6279E— 2 

— 4.7328E-3 

Re 

5.9580E— 6 

— 4.9120E— 6 

— 6.5630E— 7 


Table lb. Subsonic Airfoil ; Quasi-Ajialytical Method 


(3k 

dCxj/dpk 

dCjtf l dftk 

dCn/dfik 

a 

6.1218E+0 

9.1813E— 2 

— 3.1675E— 2 

Moo 

5.4248E-3 

1.6279E— 2 

— 4.7296E— 3 

Re 

5.9577E-6 

— 4.9123E— 6 

— 6.5637E— 7 


Table 2a. Transonic Airfoil ; Finite- Difference Method 


(3k 

dCL/dfik 

dCMldfik 

dCo/dfik 

a 

1.2976E+1 

4.3337E— 1 

— 6.2317E— 1 

Moo 

2.0293E+1 

1.9710E— 1 

— 5.9554E— 1 

Re 

— 1.1112E— 9 

— 2.8051E— 10 

1.4250E— 10 


Table 2b. Transonic Airfoil ; Quasi-Analytical Method 


(3k 

dCi,/dPk* 

dCM/dfik* 

dCo/dfik* 

a 

1.1981E+1 

4.1926E— 1 

— 4.6152E— 1 

Moo 

1.7419E+0 

1.9215E— 1 

— 5.3973E— 1 

Re 

— 6.4846E— 9 

— 7.3551E— 10 

1.3584-9 
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